Method and system for training machine learning system

ABSTRACT

The present disclosure provides a method and a system for training a machine learning system. Multiple pieces of sample data are used for training the machine learning system. The method includes acquiring multiple sample sets, each sample set including sample data in a corresponding sampling time period; setting a sampling rate for each sample set according to the corresponding sampling time period; acquiring multiple sample sets sampled according to set sampling rates; determining importance values of the multiple sampled sample sets; correcting each piece of sample data in the multiple sampled sample sets by using a corresponding importance value to obtain corrected sample data; and inputting the corrected sample data into the machine learning system to train the machine learning system.

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of International Application No. PCT/CN2017/073719, filed on Feb. 16, 2017, which is based upon and claims priority to Chinese Patent Application No. 201610113716.1, filed on Feb. 29, 2016, both of which are incorporated herein by reference in their entireties.

TECHNICAL FIELD

The present disclosure relates to the field of big data processing, and in particular, to a method and a system for training a machine learning system.

BACKGROUND

In the current big data era, it is very easy for an Internet company to acquire hyper-scale data. According to incomplete statistics, Google had 3 billion queries/30 billion advertisements every day in 2012, Facebook users shared 4.3 billion pieces of content every day in 2013, and Alibaba had more than 0.7 billion transactions on the day of Double Eleven in 2015. These companies use a machine learning system to mine data, including user interests/behaviors/habits, and the like.

A machine learning system is designed as a neural network imitating a human brain to predict behaviors of users. A machine learning system needs to be trained by using a large scale of data before being launched. However, during the training, a large amount of machine resources must be used to effectively process the large scale of data. For example, advertisement data of Tencent generally amounts to petabytes of data, and more than a thousand machines must be used, which is a huge cost for most companies.

A common processing manner is reducing the data amount processed by a machine learning system by means of random sampling, in order to reduce the cost and improve the efficiency of the machine learning system. The random sampling refers to discarding samples at a certain probability. For example, a floating-point number in a range of 0-1 is generated for each sample, and the sample is discarded if the floating-point number is greater than a threshold. However, the manner of randomly discarding samples leads to a large amount of useful data being discarded, thus diminishing the training performance of the machine learning system and reducing the prediction precision.
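The random discarding described above can be illustrated with a minimal sketch. The function name `random_discard`, the `seed` parameter, and the use of Python's `random` module are assumptions made for illustration and do not come from the disclosure.

```python
import random


def random_discard(samples, threshold, seed=0):
    """Keep a sample only if its random draw in [0, 1) does not exceed the threshold.

    Hypothetical helper: every sample is treated alike, regardless of when
    it was collected or how useful it is.
    """
    rng = random.Random(seed)
    return [sample for sample in samples if rng.random() <= threshold]


kept = random_discard(list(range(1000)), threshold=0.1)
print(len(kept))  # roughly one tenth of the input; the exact count depends on the seed
```

Because the draw ignores the sample itself, recent or otherwise valuable samples are discarded as readily as stale ones, which is the drawback the paragraph above points out.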

SUMMARY

In view of the above problems, embodiments of the present disclosure are proposed to provide a method and a system for training a machine learning system that can address the above problems or at least partially solve the above problems.

In accordance with some embodiments of the present disclosure, there is provided a method for training a machine learning system, where multiple pieces of sample data are used to train the machine learning system. The method includes acquiring multiple sample sets. Each sample set of the multiple sample sets includes sample data in a corresponding sampling time period. The method includes setting a sampling rate for each sample set according to the corresponding sampling time period. The method includes acquiring multiple sample sets sampled according to set sampling rates. The method includes determining importance values of the multiple sampled sample sets. The method includes correcting all pieces of sample data in the multiple sampled sample sets by using the importance values corresponding to the sampled sample sets to obtain corrected sample data. The method includes inputting each piece of the corrected sample data into the machine learning system to train the machine learning system.

In accordance with some embodiments of the present disclosure, there is provided a system for training a machine learning system, where multiple pieces of sample data are used to train the machine learning system. The system includes one or more memories configured to store executable program code and one or more processors configured to read the executable program code stored in the one or more memories to cause the system to perform a method. The method includes acquiring multiple sample sets, where each sample set of the multiple sample sets includes sample data in a corresponding sampling time period. The method includes setting a sampling rate for each sample set according to the corresponding sampling time period. The method includes acquiring multiple sample sets sampled according to set sampling rates. The method includes determining importance values of the multiple sampled sample sets. The method includes correcting each piece of sample data in the multiple sampled sample sets by using a corresponding importance value to obtain corrected sample data. The method includes inputting each piece of the corrected sample data into the machine learning system to train the machine learning system.

In accordance with some embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium storing a set of instructions that is executable by one or more processors of an electronic device to cause the electronic device to perform a method. The method includes acquiring multiple sample sets. Each sample set of the multiple sample sets includes sample data in a corresponding sampling time period. The method includes setting a sampling rate for each sample set according to the corresponding sampling time period. The method includes acquiring multiple sample sets sampled according to set sampling rates. The method includes determining importance values of the multiple sampled sample sets. The method includes correcting all pieces of sample data in the multiple sampled sample sets by using the importance values corresponding to the sampled sample sets to obtain corrected sample data. The method includes inputting each piece of the corrected sample data into the machine learning system to train the machine learning system.

The embodiments of the present disclosure can have the following advantages. The embodiments of the present disclosure disclose a method and a system for training a machine learning system. Sample data is processed before being inputted into the machine learning system. Sample sets divided according to sampling time periods are acquired. A sampling rate of each sample set is set according to the sampling time periods. Sampling is conducted according to sampling rates. Importance values of the sampled sample sets are determined. The sample data is corrected by using the importance values. The sample data is inputted into the machine learning system for training. The adoption rate and the utilization of important data can be guaranteed while the data amount processed by the machine learning system is reduced. The impact on the learning performance of the machine learning system can be reduced while the demand for memory resources is lessened.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart of an exemplary method for training a machine learning system according to some embodiments of the present disclosure.

FIG. 2 is a flowchart of an exemplary method for training a machine learning system according to some embodiments of the present disclosure.

FIG. 3 is a flowchart of an exemplary method for training a machine learning system according to some embodiments of the present disclosure.

FIG. 4 is a block diagram of an exemplary system for training a machine learning system according to some embodiments of the present disclosure.

FIG. 5 is a block diagram of an exemplary system for training a machine learning system according to some embodiments of the present disclosure.

FIG. 6 is a block diagram of an exemplary system for training a machine learning system according to some embodiments of the present disclosure.

DETAILED DESCRIPTION

The technical solutions in the embodiments of the present disclosure will be described below through the accompanying drawings depicting the embodiments of the present disclosure. Apparently, the described embodiments are merely a part, rather than all, of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments derived by those of ordinary skill in the art shall fall within the protection scope of the present disclosure.

One of the core ideas of the present disclosure lies in that a method and a system for training a machine learning system are proposed. Multiple pieces of sample data are used to train the machine learning system. The method includes: dividing sample data into multiple sample sets according to sampling time periods of the sample data; setting a sampling rate for each sample set according to the sampling time period corresponding to each sample set; sampling each sample set according to the corresponding sampling rate, and modifying an importance value corresponding to each sampled sample set; correcting each piece of sample data by using the importance values, and inputting the corrected sample data into the machine learning system to train the machine learning system.

According to some embodiments of the present disclosure, there is provided a method for training a machine learning system. FIG. 1 shows a flowchart of an exemplary method for training a machine learning system according to some embodiments of the present disclosure. The method for training a machine learning system provided in these embodiments can include steps S101-S106 as follows.

In step S101, multiple sample sets are acquired. Each sample set includes sample data corresponding to a sampling time period. In this step, each piece of sample data is, for example, a vector. One dimension of the vector is, for example, a sampling time of the sample data. In this step, the sampling time of all the sample data may be divided into multiple sampling time periods. The multiple pieces of sample data are divided into multiple sample sets according to the sampling time periods. Each sample set corresponds to a sampling time period.

For example, a sampling time covering all sample data is from January 24 to January 29. The sampling time may be divided into multiple sampling time periods, for example, three sampling time periods including January 29, January 27 to January 28, and January 24 to January 26. According to the above three sampling time periods, the sample data is divided into a sample set sampled on January 29, a sample set sampled from January 27 to January 28, and a sample set sampled from January 24 to January 26. Therefore, each sample set corresponds to a sampling time period.

It is noted that the sampling time periods may be divided according to a rule set by a developer or a user, and may be distributed evenly or unevenly. These are not limiting in the present disclosure.
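As a minimal sketch of step S101, the division into sample sets can be written as a grouping by date range. The year 2016, the `PERIODS` table, the `(sampling_date, features)` tuple layout, and the function name `split_by_period` are assumptions chosen for illustration; the disclosure only gives the month and day boundaries.

```python
from datetime import date

# Inclusive period boundaries from the example, oldest first.
# The year is an arbitrary assumption; the text names only month and day.
PERIODS = [
    (date(2016, 1, 24), date(2016, 1, 26)),
    (date(2016, 1, 27), date(2016, 1, 28)),
    (date(2016, 1, 29), date(2016, 1, 29)),
]


def split_by_period(samples, periods=PERIODS):
    """Group (sampling_date, features) pairs into one list per sampling time period."""
    sample_sets = [[] for _ in periods]
    for sampled_on, features in samples:
        for index, (start, end) in enumerate(periods):
            if start <= sampled_on <= end:
                sample_sets[index].append((sampled_on, features))
                break  # each sample belongs to at most one period
    return sample_sets
```

Samples whose date falls outside every period are silently skipped in this sketch; a production system would presumably need an explicit policy for them.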

In step S102, a sampling rate is set for each sample set according to the sampling time period corresponding to each sample set. In this step, a sampling rate of each sample set may be set according to a corresponding sampling time period. For example, the sampling rate may be set according to a principle that a sample set having a more recent sampling time period corresponds to a higher sampling rate. That is, the sampling rate is set higher for a sample set corresponding to a later sampling time period than the sampling rate of a sample set corresponding to an earlier sampling time period. For example, in the above example, the sampling rate of the sample set corresponding to the sample data sampled on January 29 may be set to 1.0. The sampling rate of the sample set corresponding to the sample data sampled from January 27 to January 28 may be set to 0.5. The sampling rate of the sample set corresponding to the sample data sampled from January 24 to January 26 may be set to 0.1.

In step S103, multiple sample sets sampled according to sampling rates are acquired. In this step, sample data in each sample set may be sampled according to the sampling rate set in the above step. For example, a sample set includes 1000 pieces of sample data, and a sampling rate is 0.1. Then, the number of pieces of sample data included in the sampled sample set is 1000*0.1=100. After the sampling, there are 100 pieces of sample data in the sample set. A set corresponding to the 100 pieces of sample data may be referred to as a sampled sample set.
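Steps S102 and S103 can be sketched as follows. The rates 0.1, 0.5, and 1.0 come from the example; the function name `sample_set`, the `seed` parameter, and the choice of sampling without replacement to an exact count are assumptions, since the disclosure does not specify how the pieces are selected.

```python
import random

# Sampling rates from the example, ordered from the earliest period to the latest.
RATES = [0.1, 0.5, 1.0]


def sample_set(samples, rate, seed=0):
    """Keep round(len(samples) * rate) pieces, chosen without replacement."""
    rng = random.Random(seed)
    keep = round(len(samples) * rate)
    return rng.sample(samples, keep)


sampled = sample_set(list(range(1000)), rate=0.1)
print(len(sampled))  # 100, matching 1000*0.1 in the text
```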

In step S104, importance values of the multiple sampled sample sets are determined respectively. In some embodiments, the importance value may be a coefficient set manually or by a machine algorithm. The importance value corresponding to each sampled sample set may be set manually or set by a machine according to a certain rule. In this step, a new importance value may be set on the basis of the original importance value of the sample set.

In step S105, pieces of sample data in the multiple sampled sample sets are corrected by using the importance values to obtain corrected sample data. In this step, each piece of sample data in a sampled sample set may be corrected by using an importance value corresponding to the sampled sample set to obtain corrected sample data.

Correcting each piece of sample data by using the importance value may involve multiplying each feature dimension of each vector by the importance value, such that the vector is magnified proportionally to obtain the corrected sample data.

For example, an original or default importance value of a sample set is 1 and can be corrected to 2 in this step. Therefore, a piece of sample data originally being a (1, 1, 1, 2, ..., n) may be corrected to a (2, 2, 2, 4, ..., 2n), which is corrected sample data.
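The proportional magnification in step S105 amounts to a scalar multiplication of the feature vector. A minimal sketch, with the function name `correct_sample` assumed for illustration:

```python
def correct_sample(features, importance):
    """Multiply every feature dimension by the importance value of the sample's set."""
    return [importance * value for value in features]


print(correct_sample([1, 1, 1, 2], 2))  # [2, 2, 2, 4]
```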

However, it is appreciated that the importance value is not limited to a coefficient set manually or using a machine algorithm. In some embodiments, there may be various methods for correcting the samples. For example, a mathematical operation may be performed on the sample data a (1, 1, 1, 2, ..., n), a1=f(a), and the like. Here, function f may include a mathematical function such as a geometric multiplication function, an exponential calculation, or the like.

In step S106, each piece of the corrected sample data is input into the machine learning system to train the machine learning system. In this step, the corrected sample data may be input into the machine learning system to train the machine learning system. During training, the derivative of a loss function is taken to calculate a gradient. Then, a weight close to the optimal solution may be calculated through iteration based on the gradient, an initial weight, and a set step length according to the formula “new weight=old weight+step length*gradient.”
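The iteration in step S106 can be sketched for a linear model with a squared loss. The loss function, the function names, and the step length of 0.5 are assumptions for illustration only. Note also that the sketch subtracts the step, which is the usual gradient descent convention when the gradient is the derivative of the loss; the quoted formula is written with a plus sign, which agrees with this sketch if its "gradient" term is read as the descent direction.

```python
def squared_loss_gradient(weights, features, target):
    """Gradient of 0.5 * (w . x - target)**2 with respect to w."""
    error = sum(w * x for w, x in zip(weights, features)) - target
    return [error * x for x in features]


def update_weights(weights, gradient, step_length):
    """One iteration: move the weights against the gradient of the loss."""
    return [w - step_length * g for w, g in zip(weights, gradient)]


weights = [0.0]  # initial weight
for _ in range(50):
    gradient = squared_loss_gradient(weights, features=[1.0], target=2.0)
    weights = update_weights(weights, gradient, step_length=0.5)
# After the loop, weights[0] is very close to 2.0, the minimizer of this loss.
```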

In summary, these embodiments disclose a method for training a machine learning system. The sample data is processed before being input into the machine learning system. The adoption rate and the utilization of important data can be improved while the data amount is reduced. Thus, the impact on the learning performance of the machine learning system can be reduced while the demand for memory resources is lessened.

According to some embodiments of the present disclosure, there is provided another method for training a machine learning system. FIG. 2 shows a flowchart of an exemplary method for training a machine learning system according to some embodiments of the present disclosure. The method for training a machine learning system provided in these embodiments can include steps S201-S206.

In step S201, multiple sample sets are acquired. Each sample set includes sample data corresponding to a sampling time period.

In step S202, a sampling rate is set for each sample set according to the sampling time period corresponding to each sample set.

In step S203, multiple sample sets sampled according to sampling rates are acquired.

The above three steps are identical or similar to steps S101, S102, and S103 described above, and are not repeated in detail here.

In step S204, importance values of the multiple sampled sample sets are determined respectively.

For example, step S204 may include sub-step S204a as follows.

In sub-step S204a, initial importance values of the sampled sample sets are corrected based on corresponding sampling rates to obtain corrected importance values of the sampled sample sets. A corrected importance value can be directly proportional to an initial importance value, and inversely proportional to the sampling rate of the sampled sample set.

In sub-step S204a, for example, a new importance value may be calculated based on a ratio of the original corresponding importance value of the sample set to the sampling rate. For example, an importance value of each sample set may be set initially according to the following formula:

Y1 = Y / a

wherein Y1 is a set importance value corresponding to the sample set; Y is an original importance value corresponding to the sample set; and a is a sampling rate of the sample set.

For example, as described in the example above, the sampling rate for the sampling time period from January 24 to January 26 is 0.1, and the original importance value corresponding to the set is 0.2; the sampling rate for the sampling time period from January 27 to January 28 is 0.5, and the original importance value corresponding to the set is 1; and the sampling rate for the sampling time period of January 29 is 1, and the original importance value corresponding to the set is 5. Then, according to Y1=Y/a, it can be obtained that importance values of the three sets arranged according to the sampling time periods chronologically are 2, 2, and 5 respectively.
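The arithmetic of Y1=Y/a can be checked with a short sketch. The list layout (earliest period first) and the function name `correct_importance` are assumptions; the sampling rates 0.1, 0.5, 1.0 and the original importance values 0.2, 1, 5 are the ones that yield the stated results 2, 2, and 5.

```python
def correct_importance(original, rate):
    """Y1 = Y / a: scale the original importance value by the inverse sampling rate."""
    return original / rate


rates = [0.1, 0.5, 1.0]      # a, ordered from the earliest period to the latest
originals = [0.2, 1.0, 5.0]  # Y, in the same order
corrected = [correct_importance(y, a) for y, a in zip(originals, rates)]
print(corrected)  # approximately [2, 2, 5], up to floating-point rounding
```

Dividing by the sampling rate gives a heavily down-sampled set a proportionally larger weight per retained piece, so the set as a whole keeps its influence on training.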

For example, step S204 may further include sub-step S204b as follows.

In sub-step S204b, the importance value of the sample set corresponding to the latest sampling time period is increased according to a preset rule.

In sub-step S204b, for example, the preset rule may include increasing the importance value of the sample set corresponding to the latest sampling time period so that it can be directly proportional to the importance value of the sample set corresponding to the latest sampling time period before the increase and directly proportional to the total number of the sample sets.

In this sub-step, for example, the importance value of the sample set corresponding to the latest sampling time period may be re-set according to the following formula:

Z1 = Z * b

wherein Z1 is a re-modified importance value corresponding to the sample set; Z is an initially modified importance value corresponding to the sample set; and b is the total number of the sample sets.

For example, it can be obtained according to sub-step S204a that the importance values of the three sets arranged according to the sampling time periods chronologically are 2, 2, and 5 respectively. Here, the importance value of the sampled sample set having the latest sampling time period, i.e., the third sample set, may be further increased.

For example, the importance value of the sample set corresponding to the latest sampling time period may be re-set according to the following formula:

Z1 = Z * b

wherein Z1 is a re-set importance value corresponding to the sample set; Z is an initially set importance value corresponding to the sample set; and b is the total number of the sample sets.

For example, the initially set importance value corresponding to the sample set having the latest sampling time period obtained in sub-step S204a is 5. In this sub-step, according to the formula Z1=Z*b, the re-set importance value 5*3=15 may be acquired.
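A minimal sketch of the re-setting rule Z1=Z*b follows. It assumes the importance values are held in a list ordered from the earliest sampling time period to the latest; the function name `boost_latest` is hypothetical.

```python
def boost_latest(importances):
    """Z1 = Z * b: multiply the latest set's importance value by the number of sets."""
    boosted = list(importances)  # leave the caller's list untouched
    if boosted:
        boosted[-1] = boosted[-1] * len(boosted)
    return boosted


print(boost_latest([2, 2, 5]))  # [2, 2, 15]
```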

It is noted that sub-step S204b may be performed before or after sub-step S204a, or the two sub-steps may be performed independently. For example, sub-step S204b can be independent of sub-step S204a and performed without sub-step S204a.

In step S205, pieces of sample data in the multiple sampled sample sets are corrected by using the importance values to obtain corrected sample data.

For example, this step may include sub-step S205a as follows.

In sub-step S205a, each of the importance values is multiplied by each piece of sample data in a sampled sample set corresponding to the importance value to obtain corrected sample data.

In step S206, each piece of the corrected sample data is input into the machine learning system to train the machine learning system.

The step may be identical or similar to step S106 described above, and is not repeated in detail here.

In summary, these embodiments of the present disclosure as exemplified in FIG. 2 disclose a method for training a machine learning system. The sample data is processed before being input into the machine learning system, and importance values of different sample sets are set. Therefore, the adoption rate and the utilization of important data can be improved while the data amount is reduced. The impact on the learning performance of the machine learning system can be reduced while the demand for memory resources is lessened.
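Steps S203 to S205 of FIG. 2 can be chained in one short sketch. Everything about the interface is an assumption made for illustration: the function name `prepare_training_data`, the ordering of the inputs from the earliest period to the latest, sampling without replacement to an exact count, and applying sub-step S204a before sub-step S204b.

```python
import random


def prepare_training_data(sample_sets, rates, initial_importances, seed=0):
    """Sample each set, derive its importance value, and scale the retained vectors."""
    rng = random.Random(seed)
    total_sets = len(sample_sets)
    corrected = []
    for index, (samples, rate, initial) in enumerate(
        zip(sample_sets, rates, initial_importances)
    ):
        kept = rng.sample(samples, round(len(samples) * rate))  # step S203
        importance = initial / rate                             # sub-step S204a
        if index == total_sets - 1:
            importance *= total_sets                            # sub-step S204b
        for features in kept:                                   # sub-step S205a
            corrected.append([importance * value for value in features])
    return corrected


sets = [[[1.0, 1.0]] * 10, [[1.0, 1.0]] * 10, [[1.0, 1.0]] * 10]
data = prepare_training_data(sets, rates=[0.1, 0.5, 1.0],
                             initial_importances=[0.2, 1.0, 5.0])
print(len(data))  # 16 pieces retained: 1 + 5 + 10
```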

According to some embodiments of the present disclosure, there is provided yet another method for training a machine learning system. FIG. 3 shows a flowchart of an exemplary method for training a machine learning system according to some embodiments of the present disclosure. The method for training a machine learning system provided in these embodiments can include steps S301-S306c as follows.

In step S301, multiple sample sets are acquired. Each sample set includes sample data corresponding to a sampling time period.

In step S302, a sampling rate for a sample set is set according to the sampling time period corresponding to the sample set.

In step S303, multiple sample sets sampled according to sampling rates are acquired.

In step S304, importance values of the multiple sampled sample sets are determined respectively.

In step S305, pieces of sample data in the multiple sampled sample sets are corrected by using the importance values to obtain corrected sample data.

The above steps S301 to S305 may be identical or similar to steps S101 to S105 or steps S201 to S205 described above, and are not repeated in detail here.

As shown in FIG. 3, these embodiments may further include the following step.

In step S306, each piece of the corrected sample data is input into the machine learning system to train the machine learning system. In this step, the corrected sample data may be input into the machine learning system to train the machine learning system. During training, the derivative of a loss function is taken to calculate a gradient, and a weight close to the optimal solution may be calculated through iteration based on the gradient, an initial weight, and a set step length according to the formula “new weight=old weight+step length*gradient.”

This step may include sub-steps S306a-S306c as follows.

In sub-step S306a, a gradient of each piece of the corrected sample data is calculated. The gradient may be obtained by taking the derivative of the loss function.

In sub-step S306b, a precision of the gradient of each piece of the sample data is reduced.

In this sub-step, the machine learning system is generally trained by using a gradient descent method, and a gradient of each machine needs to be calculated. If 8 bytes are required to store 1 gradient, a storage space of 10,000,000,000*8/1024/1024/1024=74.5 GB is needed for storing 10 billion gradients. If the number of bytes for storing one gradient is compressed to 4 bytes, a memory of 37.25 GB is needed for storing 10 billion gradients.
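The storage figures follow directly from the byte counts, as the short calculation below shows. The function name `gradient_storage_gb` is hypothetical, and "GB" is taken here as 1024*1024*1024 bytes, as in the division used in the text.

```python
def gradient_storage_gb(gradient_count, bytes_per_gradient):
    """Storage needed for the gradients, in units of 1024**3 bytes."""
    return gradient_count * bytes_per_gradient / 1024 / 1024 / 1024


print(gradient_storage_gb(10_000_000_000, 8))  # about 74.5
print(gradient_storage_gb(10_000_000_000, 4))  # about 37.25, half of the 8-byte figure
```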

The number of bytes for storing the gradient of each piece of sample data may be reduced by using the following formula to reduce the precision:

X1 = floor(c*X + (rand())/d)/c

wherein floor denotes rounding down; rand() generates a floating number between 0 and d; X1 is a low-precision floating number, for example, a 4-byte float to be stored by the computer, which can represent bytes for storing the gradient of each piece of the sample data after reduction; and X is a high-precision floating number, for example, an 8-byte double to be stored by the computer, which can represent bytes for storing the gradient of each piece of the sample data before reduction.

In addition, a rand function is used to introduce a random factor to reduce a cumulative error of the floating number. For example, an algorithm of (c*X+(rand())/d) is utilized, wherein X is multiplied by a fixed number c and is added with a floating number in a range of 0-1 to introduce a random factor. The value of c is an empirical value, such as 536,870,912. The value of d may be, for example, 2³¹−1, i.e., 2,147,483,647, which is an upper limit that can be generated by the rand function.

By using the foregoing formula, a high-precision floating number may be converted to a low-precision floating number, and the cumulative error may be reduced.
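The formula can be sketched in Python as below. The constants c and d are the example values from the text; the use of `random.randint(0, D)` as a stand-in for a C-style rand(), and the function name `reduce_precision`, are assumptions. The sketch only performs the randomized rounding onto a grid of spacing 1/c; converting the result to a 4-byte float for storage would be a separate step that is not shown.

```python
import math
import random

C = 536_870_912   # empirical scale factor c from the text (equal to 2**29)
D = 2**31 - 1     # d, the upper limit of rand() given in the text


def reduce_precision(x, rng=random):
    """X1 = floor(c*X + rand()/d) / c, with rand() drawn from 0..d."""
    r = rng.randint(0, D)  # assumed stand-in for C-style rand()
    return math.floor(C * x + r / D) / C


gradient = 0.123456789
reduced = reduce_precision(gradient, random.Random(0))
# reduced lies on the 1/C grid and differs from gradient by at most 1/C
```

Because the added term is uniform on the range 0-1, the value rounds up with a probability equal to its fractional part on the grid, so rounding errors tend to cancel over many gradients instead of accumulating in one direction.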

In sub-step S306c, gradients having precisions that have been reduced are input into the machine learning system to train the machine learning system.

In summary, these embodiments of the present disclosure as exemplified in FIG. 3 disclose a method for training a machine learning system. The sample data is processed before being input into the machine learning system. Importance values of different sample sets are set, and the gradient precision is reduced. Therefore, the adoption rate and the utilization of important data can be improved while the data amount is reduced. The impact on the learning performance of the machine learning system can be reduced while the demand for memory resources is lessened.

According to some embodiments of the present disclosure, there is provided a system for training a machine learning system. FIG. 4 shows a block diagram of an exemplary system for training a machine learning system according to some embodiments of the present disclosure. The system for training a machine learning system provided in these embodiments trains the machine learning system by using multiple pieces of sample data. As shown in FIG. 4, embodiments of the present disclosure provide a training system 400. Training system 400 can include a first acquisition module 401 configured to acquire multiple sample sets. Each sample set includes sample data corresponding to a sampling time period. Training system 400 can include a sampling rate setting module 402 configured to set a sampling rate for a sample set according to the sampling time period corresponding to the sample set. Training system 400 can include a second acquisition module 403 configured to acquire multiple sample sets sampled according to sampling rates. Training system 400 can include an importance value determination module 404 configured to respectively set importance values of the multiple sampled sample sets. Training system 400 can include a sample data correction module 405 configured to correct each piece of sample data in the multiple sampled sample sets by using its corresponding importance value to obtain corrected sample data. Training system 400 can include a training module 406 configured to input each piece of the corrected sample data into the machine learning system to train the machine learning system.

In these embodiments, the sampling rate is set higher for a sample set corresponding to a later sampling time period than the sampling rate of a sample set corresponding to an earlier sampling time period.

In summary, these embodiments of the present disclosure disclose a system for training a machine learning system. The sample data is processed before being input into the machine learning system. The adoption rate and the utilization of important data can be improved while the data amount is reduced. Thus, the impact on the learning performance of the machine learning system can be reduced while the demand for memory resources is lessened.

According to some embodiments of the present disclosure, there is provided another system for training a machine learning system. FIG. 5 shows a block diagram of an exemplary system for training a machine learning system according to some embodiments of the present disclosure. The system for training a machine learning system provided in these embodiments trains the machine learning system by using multiple pieces of sample data. As shown in FIG. 5, embodiments of the present disclosure provide a training system 500. Training system 500 can include a first acquisition module 501 configured to acquire multiple sample sets. Each sample set includes sample data corresponding to a sampling time period. Training system 500 can include a sampling rate setting module 502 configured to set a sampling rate for a sample set according to the sampling time period corresponding to the sample set. Training system 500 can include a second acquisition module 503 configured to acquire multiple sample sets sampled according to sampling rates. Training system 500 can include an importance value determination module 504 configured to respectively set importance values of the multiple sampled sample sets. Training system 500 can include a sample data correction module 505 configured to correct each piece of sample data in the multiple sampled sample sets by using its corresponding importance value to obtain corrected sample data. Training system 500 can include a training module 506 configured to input each piece of the corrected sample data into the machine learning system to train the machine learning system.

In these embodiments, sample data correction module 505 can beconfigured to multiply each of the importance values by each piece ofsample data in a sampled sample set corresponding to the importancevalue to obtain corrected sample data.

In these embodiments, importance value determination module 504 caninclude a primary correction sub-module 504 a configured to correctinitial importance values of the sampled sample sets based oncorresponding sampling rates to obtain corrected or modified importancevalues of the sampled sample sets. A corrected or modified importancevalue can be directly proportional to its corresponding initialimportance value, and inversely proportional to the sampling rate of thesampled sample set.

For example, the primary correction sub-module may set an importancevalue of each sample set primarily according to the following formula:Y1=Y/a;wherein Y1 is a set importance value corresponding to the sample set; Yis an original importance value set corresponding to the sample set; anda is a sampling rate of the sample set.

In these embodiments, importance value determination module 504 mayfurther include a secondary correction sub-module 504 b configured toincrease the importance value of the sample set corresponding to thelatest sampling time period according to a preset rule.

The preset rule can include increasing the importance value of thesample set corresponding to the latest sampling time period so that itcan be directly proportional to the importance value of the sample setcorresponding to the latest sampling time period before increase anddirectly proportional to the total number of the sample sets.

For example, the importance value of the sample set corresponding to thelatest sampling time period may be re-set according to the followingformula:Z1=Z*b;wherein Z1 is a re-set importance value set corresponding to the sampleset; Z is an initially set importance value corresponding to the sampleset; and b is the total number of the sample sets.

In these embodiments, the sampling rate is set higher for a sample setcorresponding to a later sampling time period than the sampling rate ofa sample set corresponding to an earlier sampling time period.

In summary, these embodiments of the present disclosure disclose a system for training a machine learning system. The sample data is processed before being input into the machine learning system, and importance values of different sample sets are set. Therefore, the adoption rate and the utilization of important data can be improved while the data amount is reduced. The impact on the learning performance of the machine learning system can be reduced while the demand for memory resources is lessened.

According to some embodiments of the present disclosure, there is provided yet another system for training a machine learning system. FIG. 6 shows a block diagram of an exemplary system for training a machine learning system according to some embodiments of the present disclosure. The system for training a machine learning system provided in these embodiments trains the machine learning system by using multiple pieces of sample data. As shown in FIG. 6, embodiments of the present disclosure provide a training system 600. Training system 600 can include a first acquisition module 601 configured to acquire multiple sample sets. Each sample set includes sample data corresponding to a sampling time period. Training system 600 can include a sampling rate setting module 602 configured to set a sampling rate for a sample set according to the sampling time period corresponding to the sample set. Training system 600 can include a second acquisition module 603 configured to acquire multiple sample sets sampled according to sampling rates. Training system 600 can include an importance value determination module 604 configured to respectively set importance values of the multiple sampled sample sets. Training system 600 can include a sample data correction module 605 configured to correct each piece of sample data in the multiple sampled sample sets by using its corresponding importance value to obtain corrected sample data. Training system 600 can include a training module 606 configured to input each piece of the corrected sample data into the machine learning system to train the machine learning system.

In these embodiments, training module 606 can include a calculation sub-module 606 a configured to calculate a gradient of each piece of the corrected sample data. Training module 606 can include a precision reduction sub-module 606 b configured to reduce the precision of each of the gradients. Training module 606 can include a training sub-module 606 c configured to input the reduced-precision gradients into the machine learning system to train the machine learning system.

In these embodiments, precision reduction sub-module 606 b can be configured to reduce bytes for storing each gradient by using the following formula to reduce the precision: X1 = floor(c*X + (rand())/d)/c; wherein floor denotes rounding down; rand() generates a floating-point number between 0 and d; X1 is the number of bytes for storage after reduction; and X is the number of bytes for storage before reduction.
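
The formula above can be sketched in code as follows. Several points are assumptions made for this example rather than statements from the text: the sketch applies the formula to a gradient value X directly, whereas the text describes X and X1 in terms of bytes for storage; c is treated as a positive scaling constant and d as the positive upper bound of rand(), neither of which is defined further in the text; and the name `reduce_precision` is hypothetical.

```python
import math
import random


def reduce_precision(x: float, c: float, d: float, rng: random.Random) -> float:
    """Compute X1 = floor(c*X + rand()/d) / c with rand() drawn from [0, d).

    Assumptions: x is the gradient value, c > 0 is a scaling constant,
    and d > 0 is the upper bound of rand().
    """
    if c <= 0 or d <= 0:
        raise ValueError("c and d must be positive")
    r = rng.random() * d  # floating-point number in [0, d)
    return math.floor(c * x + r / d) / c


# Round a gradient value onto a grid with spacing 1/256.
rng = random.Random(0)
x1 = reduce_precision(0.123456, c=256.0, d=1.0, rng=rng)
```

Because rand()/d is uniformly distributed between 0 and 1, the result is always a multiple of 1/c within 1/c of X, and it rounds up or down with probabilities that leave the expected value equal to X. Values on such a grid can be stored as small integers, which is one way the storage per gradient could be reduced.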

In summary, these embodiments of the present disclosure disclose a system for training a machine learning system. The sample data is processed before being input into the machine learning system. Importance values of different sample sets are set, and the gradient precision is reduced. Therefore, the adoption rate and the utilization of important data can be guaranteed while the data amount is reduced. The impact on the learning performance of the machine learning system can be reduced while the demand for memory resources is lessened.

The apparatus embodiments provide functionality that is basically similar to the functionality provided by the method embodiments, so they are described only briefly. Reference may be made to the descriptions of the relevant parts in the method embodiments.

The embodiments of this disclosure are all described in a progressive manner. Each embodiment emphasizes a difference from other embodiments, and identical or similar parts in the embodiments may be obtained from each other.

Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present disclosure may be implemented as a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present disclosure may be a computer program product implemented on one or more computer usable storage media (including, but not limited to, a magnetic disk memory, a CD-ROM, an optical memory, and the like) including computer usable program codes.

In a typical configuration, the computer device includes one or more processors (CPU), an input/output interface, a network interface, and a memory. The memory may include a volatile memory, a random access memory (RAM) and/or a non-volatile memory or the like in a computer readable medium, for example, a read only memory (ROM) or a flash RAM. The memory is an example of the computer readable medium. The computer readable medium includes non-volatile and volatile media as well as movable and non-movable media, and can implement information storage by means of any method or technology. A storage medium of a computer includes, but is not limited to, for example, a phase change memory (PRAM), a static random access memory (SRAM), a dynamic random access memory (DRAM), other types of RAMs, a ROM, an electrically erasable programmable read-only memory (EEPROM), a flash memory or other memory technologies, a compact disk read only memory (CD-ROM), a digital versatile disc (DVD) or other optical storages, a cassette tape, a magnetic tape/magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, and can be used to store signals accessible to the computing device. According to the definition of this text, the computer readable medium does not include transitory media, such as a modulated data signal and a carrier.

The embodiments of the present disclosure are described with reference to flowcharts and/or block diagrams according to the method, terminal device (system) and computer program product according to the embodiments of the present disclosure. It should be understood that a computer program instruction may be used to implement each process and/or block in the flowcharts and/or block diagrams and combinations of processes and/or blocks in the flowcharts and/or block diagrams. The computer program instructions may be provided to a computer, an embedded processor, or another programmable data processing terminal device to generate a machine, such that the computer or a processor of another programmable data processing terminal device executes an instruction to generate an apparatus configured to implement functions designated in one or more processes in a flowchart and/or one or more blocks in a block diagram.

The computer program instructions may also be stored in a computer readable storage capable of guiding a computer or another programmable data processing terminal device to work in a specific manner, such that the instructions stored in the computer readable storage generate an article of manufacture including an instruction apparatus, and the instruction apparatus implements functions designated by one or more processes in a flowchart and/or one or more blocks in a block diagram.

The computer program instructions may also be loaded in a computer or another programmable data processing terminal device, such that a series of operation steps are executed on the computer or another programmable terminal device to generate computer-implemented processing. Therefore, the instructions executed on the computer or another programmable terminal device provide steps for implementing functions designated in one or more processes in a flowchart and/or one or more blocks in a block diagram.

Embodiments of the present disclosure have been described; however, once they know the basic creative concepts, those skilled in the art can make other variations and modifications to the embodiments. Therefore, the appended claims are intended to be construed as including the embodiments described herein and all variations and modifications falling within the scope of the embodiments of the present disclosure.

Finally, it should be further noted that the relational terms in this text such as “first” and “second” are merely used to distinguish one entity or operation from another entity or operation, and do not require or imply that the entities or operations have this actual relation or order. Moreover, the term “include,” “comprise” or other variations thereof is intended to cover non-exclusive inclusion, so that a process, method, article or terminal device including a series of elements not only includes the elements, but also includes other elements not clearly listed, or further includes inherent elements of the process, method, article or terminal device. In the absence of more limitations, an element defined by “including a(n) . . . ” does not exclude that the process, method, article or terminal device including the element further has other identical elements.

The above descriptions of the embodiments are merely used to help understand the methods and systems of the present disclosure and its core ideas. Meanwhile, for those of ordinary skill in the art, there may be modifications to the specific implementation manners and application scopes according to the idea of the present disclosure. Therefore, the content of the specification should not be construed as limiting the present disclosure.

The invention claimed is:
 1. A method for training a machine learning system using multiple pieces of sample data, the method comprising: acquiring multiple sample sets, each sample set of the multiple sample sets comprising sample data in a corresponding sampling time period; setting a sampling rate for each sample set according to the corresponding sampling time period; acquiring multiple sample sets sampled according to set sampling rates; determining importance values of the multiple sampled sample sets, comprising: correcting initial importance values of the sampled sample sets based on the sampling rates corresponding to the sampled sample sets to obtain corrected importance values of the sampled sample sets, comprising increasing the initial importance value of the sample set corresponding to a latest sampling time period according to a preset rule, wherein the preset rule comprises: the initial importance value of the sample set corresponding to the latest sampling time period is increased, such that the increased importance value is proportional to the initial importance value of the sample set corresponding to the latest sampling time period and is proportional to the total number of the sample sets; wherein, for each sampled sample set, the corrected importance value is proportional to the initial importance value, and is inversely proportional to the sampling rate of the sampled sample set; correcting each piece of sample data in the multiple sampled sample sets by using a corresponding importance value to obtain corrected sample data; and inputting each piece of the corrected sample data into the machine learning system to train the machine learning system.
 2. The method according to claim 1, wherein correcting each piece of sample data in the multiple sampled sample sets comprises: multiplying each of the importance values by each piece of sample data in the sampled sample set corresponding to each importance value to obtain corrected sample data.
 3. The method according to claim 1, wherein inputting each piece of the corrected sample data into the machine learning system to train the machine learning system comprises: determining a gradient of each piece of the corrected sample data; reducing a precision of each of the gradients; and inputting the gradients having reduced precisions into the machine learning system to train the machine learning system.
 4. The method according to claim 3, wherein reducing the precision of each of the gradients comprises: reducing bytes for storing each gradient by using the following formula to reduce the precision: X1 = floor(c*X + (rand())/d)/c, wherein floor is rounded down; rand() is to generate a floating number between 0-d; X1 is the number of bytes for storage after reduction; and X is the number of bytes for storage before reduction.
 5. A system for training a machine learning system using multiple pieces of sample data, the system comprising: one or more memories configured to store executable program code; and one or more processors configured to read the executable program code stored in the one or more memories to cause the system to perform: acquiring multiple sample sets, each sample set of the multiple sample sets comprising sample data in a corresponding sampling time period; setting a sampling rate for each sample set according to the corresponding sampling time period; acquiring multiple sample sets sampled according to set sampling rates; determining importance values of the multiple sampled sample sets, comprising: correcting initial importance values of the sampled sample sets based on the sampling rates corresponding to the sampled sample sets to obtain corrected importance values of the sampled sample sets, comprising increasing the initial importance value of the sample set corresponding to a latest sampling time period according to a preset rule, wherein the preset rule comprises: the initial importance value of the sample set corresponding to the latest sampling time period is increased, such that the increased importance value is proportional to the initial importance value of the sample set corresponding to the latest sampling time period and is proportional to the total number of the sample sets; wherein, for each sampled sample set, the corrected importance value is proportional to the initial importance value, and is inversely proportional to the sampling rate of the sampled sample set; correcting each piece of sample data in the multiple sampled sample sets by using a corresponding importance value to obtain corrected sample data; and inputting each piece of the corrected sample data into the machine learning system to train the machine learning system.
 6. The system according to claim 5, wherein the one or more processors are configured to read the executable program code to cause the system to perform the following to correct each piece of sample data in the multiple sampled sample sets: multiplying each of the importance values by each piece of sample data in the sampled sample set corresponding to each importance value to obtain corrected sample data.
 7. The system according to claim 5, wherein the one or more processors are configured to read the executable program code to cause the system to perform the following to input each piece of the corrected sample data into the machine learning system to train the machine learning system: determining a gradient of each piece of the corrected sample data; reducing a precision of each of the gradients; and inputting the gradients whose precision has been reduced into the machine learning system to train the machine learning system.
 8. The system according to claim 7, wherein the one or more processors are configured to read the executable program code to cause the system to perform the following to reduce the precision of each of the gradients: reducing bytes for storing each gradient by using the following formula to reduce the precision: X1 = floor(c*X + (rand())/d)/c, wherein floor is rounded down; rand() is to generate a floating number between 0-d; X1 is the number of bytes for storage after reduction; and X is the number of bytes for storage before reduction.
 9. A non-transitory computer-readable storage medium storing a set of instructions that is executable by one or more processors of an electronic device to cause the electronic device to perform a method comprising: acquiring multiple sample sets, each sample set of the multiple sample sets comprising sample data in a corresponding sampling time period; setting a sampling rate for each sample set according to the corresponding sampling time period; acquiring multiple sample sets sampled according to set sampling rates; determining importance values of the multiple sampled sample sets, comprising: correcting initial importance values of the sampled sample sets based on the sampling rates corresponding to the sampled sample sets to obtain corrected importance values of the sampled sample sets, comprising increasing the initial importance value of the sample set corresponding to a latest sampling time period according to a preset rule, wherein the preset rule comprises: the initial importance value of the sample set corresponding to the latest sampling time period is increased, such that the increased importance value is proportional to the initial importance value of the sample set corresponding to the latest sampling time period and is proportional to the total number of the sample sets; wherein, for each sampled sample set, the corrected importance value is proportional to the initial importance value, and is inversely proportional to the sampling rate of the sampled sample set; correcting each piece of sample data in the multiple sampled sample sets by using a corresponding importance value to obtain corrected sample data; and inputting each piece of the corrected sample data into a machine learning system to train the machine learning system.
 10. The non-transitory computer-readable storage medium of claim 9, wherein the set of instructions that is executable by the one or more processors of the electronic device causes the electronic device to perform the following to correct each piece of sample data in the multiple sampled sample sets: multiplying each of the importance values by each piece of sample data in the sampled sample set corresponding to each importance value to obtain corrected sample data.
 11. The non-transitory computer-readable storage medium of claim 9, wherein the set of instructions that is executable by the one or more processors of the electronic device causes the electronic device to perform the following to input each piece of the corrected sample data into the machine learning system to train the machine learning system: determining a gradient of each piece of the corrected sample data; reducing a precision of each of the gradients; and inputting the gradients having reduced precisions into the machine learning system to train the machine learning system.
 12. The non-transitory computer-readable storage medium of claim 11, wherein the set of instructions that is executable by the one or more processors of the electronic device causes the electronic device to perform the following to reduce the precision of each of the gradients: reducing bytes for storing each gradient by using the following formula to reduce the precision: X1 = floor(c*X + (rand())/d)/c, wherein floor is rounded down; rand() is to generate a floating number between 0-d; X1 is the number of bytes for storage after reduction; and X is the number of bytes for storage before reduction.